Text Data Augmentation for Deep Learning
Abstract
Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning. In this survey, we consider how the Data Augmentation training strategy can aid in its development. We begin with the major motifs of Data Augmentation, summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data. Deep Learning generally struggles with the measurement of generalization and the characterization of overfitting. We highlight studies that cover how augmentations can construct test sets for generalization. NLP is at an early stage in applying Data Augmentation compared to Computer Vision. We highlight the key differences and promising ideas that have yet to be tested in NLP. For the sake of practical implementation, we describe tools that facilitate Data Augmentation, such as the use of consistency regularization, controllers, and offline and online augmentation pipelines, to preview a few. Finally, we discuss interesting topics around Data Augmentation in NLP, such as task-specific augmentations, the use of prior knowledge in self-supervised learning versus Data Augmentation, intersections with transfer and multi-task learning, and ideas for AI-GAs (AI-Generating Algorithms). We hope this paper inspires further research interest in Text Data Augmentation.
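The abstract mentions offline and online augmentation pipelines. The following is a minimal sketch of that distinction, assuming a simple token-level random-deletion augmentation (an illustrative choice, not the survey's own method) and hypothetical helper names (`random_deletion`, `offline_augment`, `online_batches`). Offline augmentation expands and stores the dataset once before training; online augmentation draws a fresh variant every time an example is served.

```python
import random

def random_deletion(tokens, p, rng):
    """Hypothetical label-preserving augmentation: drop each token with probability p."""
    if not tokens:
        return tokens
    kept = [t for t in tokens if rng.random() > p]
    # Never return an empty sentence; fall back to one random original token.
    return kept if kept else [rng.choice(tokens)]

def offline_augment(dataset, n_copies, p, seed=0):
    """Offline pipeline: expand the (text, label) dataset once, before training."""
    rng = random.Random(seed)
    out = list(dataset)  # keep the original examples first
    for text, label in dataset:
        for _ in range(n_copies):
            augmented = " ".join(random_deletion(text.split(), p, rng))
            out.append((augmented, label))  # the label is copied unchanged
    return out

def online_batches(dataset, batch_size, p, seed=0):
    """Online pipeline: augment on the fly, so each pass sees new variants."""
    rng = random.Random(seed)
    for i in range(0, len(dataset), batch_size):
        yield [(" ".join(random_deletion(text.split(), p, rng)), label)
               for text, label in dataset[i:i + batch_size]]

if __name__ == "__main__":
    data = [("the movie was surprisingly good", 1), ("a dull and predictable plot", 0)]
    expanded = offline_augment(data, n_copies=2, p=0.1)
    print(len(expanded))  # 2 originals + 2 copies each = 6
    for epoch in range(2):
        # A different seed per epoch gives different augmentations each pass.
        for batch in online_batches(data, batch_size=2, p=0.1, seed=epoch):
            print(epoch, batch)
```

The trade-off this illustrates: the offline variant fixes the augmented set (cheap at training time, but storage grows with `n_copies`), while the online variant costs computation on every pass but never shows the model exactly the same augmented sample twice by design.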
Journal
Journal title: Journal of Big Data
Year: 2021
ISSN: 2196-1115
DOI: https://doi.org/10.1186/s40537-021-00492-0